AttentionHTR: Handwritten Text Recognition Based on Attention Encoder-Decoder Networks. (arXiv:2201.09390v3 [cs.CV] UPDATED)
This work proposes an attention-based sequence-to-sequence model for
handwritten word recognition and explores transfer learning for data-efficient
training of HTR systems. To overcome training data scarcity, it leverages
models pre-trained on scene text images as a starting point for tailoring the
handwriting recognition models. ResNet feature extraction and bidirectional
LSTM-based sequence modeling stages together form the encoder. The prediction
stage consists of a decoder and a content-based attention mechanism (a minimal
sketch of this pipeline is given after the abstract).
The effectiveness of the proposed end-to-end HTR system has been empirically
evaluated on the novel multi-writer Imgur5K dataset and on the IAM dataset. The
experimental results demonstrate the performance of the HTR framework, further
supported by an in-depth analysis of the error cases. Source code and
pre-trained models are available at https://github.com/dmitrijsk/AttentionHTR.
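For concreteness, below is a minimal PyTorch sketch of the pipeline described in the abstract: convolutional feature extraction, bidirectional LSTM sequence modeling, and a decoder with content-based (additive) attention. All module names, layer sizes, the simplified convolutional backbone standing in for ResNet, and the greedy decoding loop are illustrative assumptions, not the authors' implementation; the actual code is available at the repository linked above.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F


    class Encoder(nn.Module):
        """CNN feature extraction followed by bidirectional LSTM sequence modeling."""

        def __init__(self, hidden: int = 256):
            super().__init__()
            # Small convolutional backbone standing in for the ResNet stage.
            self.cnn = nn.Sequential(
                nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
                nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
                nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d((1, None)),  # collapse image height, keep width as time axis
            )
            self.rnn = nn.LSTM(256, hidden, num_layers=2,
                               bidirectional=True, batch_first=True)

        def forward(self, images):                     # images: (B, 1, H, W) grayscale word crops
            feats = self.cnn(images)                   # (B, 256, 1, W')
            feats = feats.squeeze(2).permute(0, 2, 1)  # (B, W', 256): one feature vector per column
            out, _ = self.rnn(feats)                   # (B, W', 2 * hidden)
            return out


    class AttentionDecoder(nn.Module):
        """Character decoder with content-based (additive) attention over encoder outputs."""

        def __init__(self, num_classes: int, enc_dim: int = 512, hidden: int = 256):
            super().__init__()
            self.hidden = hidden
            self.embed = nn.Embedding(num_classes, hidden)
            self.score_enc = nn.Linear(enc_dim, hidden)
            self.score_dec = nn.Linear(hidden, hidden)
            self.score_v = nn.Linear(hidden, 1)
            self.cell = nn.LSTMCell(enc_dim + hidden, hidden)
            self.out = nn.Linear(hidden, num_classes)

        def forward(self, enc_out, targets=None, max_len=25, sos_idx=0):
            B = enc_out.size(0)
            h = enc_out.new_zeros(B, self.hidden)
            c = enc_out.new_zeros(B, self.hidden)
            y = torch.full((B,), sos_idx, dtype=torch.long, device=enc_out.device)
            logits = []
            steps = targets.size(1) if targets is not None else max_len
            for t in range(steps):
                # Content-based attention: score every encoder position against the decoder state.
                scores = self.score_v(torch.tanh(
                    self.score_enc(enc_out) + self.score_dec(h).unsqueeze(1)))  # (B, W', 1)
                alpha = F.softmax(scores, dim=1)
                context = (alpha * enc_out).sum(dim=1)                          # (B, enc_dim)
                h, c = self.cell(torch.cat([context, self.embed(y)], dim=1), (h, c))
                step_logits = self.out(h)
                logits.append(step_logits)
                # Teacher forcing with ground-truth characters during training, greedy otherwise.
                y = targets[:, t] if targets is not None else step_logits.argmax(dim=1)
            return torch.stack(logits, dim=1)            # (B, steps, num_classes)


    # Example forward pass on a batch of four 32x100 word images.
    encoder = Encoder()
    decoder = AttentionDecoder(num_classes=80)
    enc_out = encoder(torch.randn(4, 1, 32, 100))        # (4, 25, 512)
    preds = decoder(enc_out)                             # (4, 25, 80)

In the transfer-learning setting described in the abstract, such an encoder would be initialized from weights pre-trained on scene text images and then fine-tuned on handwritten word images.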